Closed
Bug 1442893
Opened 7 years ago
Closed 7 years ago
Intermittent talos Aborting task - max run time exceeded!
Categories
(Testing :: Talos, enhancement)
Tracking
(firefox61 fixed)
RESOLVED
FIXED
mozilla61
Release | Tracking | Status |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: aryx, Assigned: jmaher)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])
Attachments
(1 file)
862 bytes, patch; rwood: review+
Hit a central-as-beta run: https://treeherder.mozilla.org/logviewer.html#?job_id=165748071&repo=try
New bug because bug 1420078 got closed.
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Comment hidden (Intermittent Failures Robot) |
Reporter | |
Comment 3•7 years ago
Bug 1420394 is about reftests.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 10•7 years ago
There are 69 failures in the past week.
Platforms: most of the occurrences are on OS X 10.10 opt and debug, but we also have some on Windows 7 opt and pgo, windows2012-32 opt and windows10-64 pgo.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=173114870
:rwood, can you please take a look at this?
Flags: needinfo?(rwood)
Whiteboard: [stockwell needswork]
Comment 15•7 years ago
There are 62 failures in the last 7 days.
They occur mostly on OS X 10.10, Windows 7, windows10-64, linux64-ccov, macosx64-nightly.
The affected build types are: debug, opt, pgo.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=174949491
Aborting task - max run time exceeded!
[taskcluster 2018-04-21T16:31:18.261Z] Exit Code: -1
[taskcluster 2018-04-21T16:31:18.261Z] === Task Finished ===
[taskcluster 2018-04-21T16:31:18.261Z] Task Duration: 29m58.755625774s
[taskcluster 2018-04-21T16:31:18.872Z] Uploading artifact public/logs/localconfig.json from file logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:19.368Z] Uploading artifact public/logs/talos_critical.log from file logs/talos_critical.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:19.833Z] Uploading artifact public/logs/talos_error.log from file logs/talos_error.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:20.286Z] Uploading artifact public/logs/talos_fatal.log from file logs/talos_fatal.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:20.662Z] Uploading artifact public/logs/talos_info.log from file logs/talos_info.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:21.275Z] Uploading artifact public/logs/talos_raw.log from file logs/talos_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:21.844Z] Uploading artifact public/logs/talos_warning.log from file logs/talos_warning.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:22.207Z] Uploading artifact public/test_info/h1-e10s_errorsummary.log from file build/blobber_upload_dir/h1-e10s_errorsummary.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:22.578Z] Uploading artifact public/test_info/h1-e10s_raw.log from file build/blobber_upload_dir/h1-e10s_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-04-21T15:31:41.513Z
[taskcluster 2018-04-21T16:31:22.668Z] Task not successful due to following exception(s):
[taskcluster 2018-04-21T16:31:22.668Z] Exception 1)
[taskcluster 2018-04-21T16:31:22.668Z] signal: killed
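The task duration reported in the log above sits just under the 30 minute max run time. As a minimal sketch, assuming the duration string always follows the `29m58.755625774s` form seen in this log (optionally with an hours part), the following Python converts it to seconds so it can be compared against the limit. The helper name `parse_task_duration` is hypothetical and is not part of taskcluster or Talos.

```python
import re

# Hypothetical helper; not part of taskcluster or Talos.
# Assumes durations look like "29m58.755625774s", optionally with an hours part.
_DURATION_RE = re.compile(
    r"^(?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?:(?P<s>\d+(?:\.\d+)?)s)?$"
)


def parse_task_duration(text):
    """Return the duration in seconds as a float."""
    match = _DURATION_RE.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError("unrecognised duration: %r" % text)
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m") or 0)
    seconds = float(match.group("s") or 0.0)
    return hours * 3600 + minutes * 60 + seconds


MAX_RUN_TIME_SECONDS = 30 * 60  # the 30 minute limit discussed in this bug

duration = parse_task_duration("29m58.755625774s")
# Prints the duration in seconds and how far it is from the limit.
print(duration, MAX_RUN_TIME_SECONDS - duration)
```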
Assignee | ||
Comment 17•7 years ago
This seems to have slowed down; osx is our biggest problem. :rwood, when you look at this, let's just focus on the osx failures.
Comment 18•7 years ago
I noticed this happens quite often on ts_paint_heavy. Here's a range: http://tinyurl.com/y82fsua6
And a failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=175391012&repo=mozilla-inbound
Comment 20•7 years ago
Update:
There have been 37 failures in the last 7 days.
All failures occur on OS X 10.10 / opt with 1 exception for OS X 10.10 / debug.
Recent log file:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=176646993
Summary: Intermittent talos Aborting task - max run time exceeded!
Assignee | ||
Comment 21•7 years ago
This is really confusing: the h1 job (the large majority of failures) can often complete in 5 minutes, and sometimes it takes longer. Even if we said the average runtime is 10 minutes, a 30 minute max runtime seems like plenty of time. Why is it that we run slower?
Looking at the logs, it seems that we spend about 20 minutes unpacking the heavy profile. The jobs which pass used a cached version of the profile:
01:42:50 INFO - Initialising browser for ts_paint_heavy test...
01:42:55 INFO - Local copy of 'simple' is fresh enough
01:42:55 INFO - 3 days old
My conclusion is that we need to accept the fact that the average runtime is 5 minutes and that profile download+extraction will add an additional 10-30 minutes to the process. So should we adjust the max runtime to 40 minutes?
Looking at the value we get from heavy vs plain:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=31536000&series=mozilla-inbound,1640692,1,1&series=mozilla-inbound,1640641,1,1
I see we post numbers ~5% higher for heavy, but the pattern is the same and noise levels are similar. In short, we are not seeing anything unique from a heavy profile on osx.
I vote to disable this test on osx, as we have data for win7/win10/linux64.
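The arithmetic behind this can be sketched roughly as follows. It uses only the numbers quoted in this comment (about 5 minutes of test time, 10 to 30 minutes of profile download and extraction, and a 30 minute max runtime); the function and constant names are made up for the illustration and do not come from Talos or taskcluster.

```python
# Rough sketch of the run-time budget; all names are hypothetical.
TEST_MINUTES = 5                  # typical h1 run with a cached profile
PROFILE_MINUTES_RANGE = (10, 30)  # extra time when the profile is not cached


def fits_in_budget(max_run_time, profile_minutes):
    """True if test time plus profile handling fits inside max_run_time."""
    return TEST_MINUTES + profile_minutes <= max_run_time


for limit in (30, 40):
    best = fits_in_budget(limit, PROFILE_MINUTES_RANGE[0])
    worst = fits_in_budget(limit, PROFILE_MINUTES_RANGE[1])
    print("max run time %d min: best case fits=%s, worst case fits=%s"
          % (limit, best, worst))
```

Under these assumptions the worst case (35 minutes) overruns a 30 minute limit, while a 40 minute limit would cover it.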
Assignee | ||
Comment 22•7 years ago
We could either:
1) extend the timeout
2) do #1 and restrict to try/m-c
3) disable it
I chose 3 to reduce confusion and to focus our efforts on tests that provide value. If you would prefer, I could delete the line instead of commenting it out.
Assignee: nobody → jmaher
Status: REOPENED → ASSIGNED
Flags: needinfo?(rwood)
Attachment #8972815 - Flags: review?(rwood)
Comment 23•7 years ago
Comment on attachment 8972815
disable h1 on osx
Review of attachment 8972815:
-----------------------------------------------------------------
I completely agree; this test has always had issues unpacking the heavy profile. It's also still running on Windows and Linux anyway. I vote to just remove it permanently.
Attachment #8972815 - Flags: review?(rwood) → review+
Comment 24•7 years ago
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/64703ce328ea
disable ts_paint_heavy on osx due to length of time to unpack profile. r=rwood
Assignee | ||
Updated•7 years ago
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Comment 25•7 years ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox61: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Comment 27•7 years ago
This doesn't seem to be fixed.
There are 35 failures in the last 7 days.
Last failure in OrangeFactor (OF), 11 May 2018, 05:36: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=178002803
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 28•7 years ago
This is mostly:
* osx tp6 (I think a bad machine)
* test-verify on android
* osx-qr reftests
Reporter | |
Comment 30•7 years ago
Setting this to fixed and telling people to use bug 1439979 for new occurrences.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED